Cache memory.

A microprocessor is provided with an inte- 
gral, two level cache memory architecture. The 
microprocessor includes a microprocessor 
core and a set associative first level cache both
located on a common semiconductor die. A 
replacement cache, which is at least as large as 
approximately one half the size of the first level 
cache, is situated on the same semiconductor
die and is coupled to the first level cache. In the
event of a first level cache miss, a first level 
entry is discarded and stored in the replace- 
ment cache. When such a first level cache miss 
occurs, the replacement cache is checked to
see if the desired entry is stored therein. If a 
replacement cache hit occurs, then the hit entry 
is forwarded to the first level cache and stored 
therein. If a cache miss occurs in both the first 
level cache and the replacement cache, then a 
main memory access is commenced to retrieve 
the desired entry. In that event, the desired entry 
retrieved from main memory is forwarded to the 
first level cache and stored therein. When a 
replacement cache entry is removed from the 
replacement cache by the replacement 
algorithm associated therewith, that entry is 
written back to main memory if that entry was
modified. Otherwise the entry is discarded. 
This invention relates in general to memory architectures
for computer systems and, more particularly,
to cache memories for use with computer processors. 

Processors often take several clock cycles to access
data which is stored in a main memory located
external to the processor. Not only do these external 
memory accesses require a significant amount of 
time, these accesses also consume a significant 
amount of power. Cache memories have often been 
used to enhance computer system performance by 
providing a processor with a relatively small, high 
speed memory (or cache) for storing instructions and 
data which have recently been accessed by the processor.
These instructions and data are stored in the
cache in the hope that, since they have been accessed
once, they will be accessed again relatively
soon. The speed or access time of the cache memory 
is substantially faster than that of the external main
memory. By retrieving an instruction or data from the 
cache when a cache hit occurs rather than accessing 
the slower external main memory, significant time 
can be saved in the retrieval of the desired information.

A recent trend has been to integrate a first level 
(L1) cache on the microprocessor chip together with 
the microprocessor core as shown in FIG. 1. In this 
particular example, the microprocessor chip has been 
provided with a level 1 cache (L1) located on the chip 
and a level 2 cache (L2) located external to the microprocessor
chip. The on-chip L1 cache includes both
an L1 instruction cache and an L1 data cache. The L1 
caches and the L2 cache are coupled via physical ad- 
dress and physical data buses to the external main 
memory in this example. The off-chip L2 cache is typically
orders of magnitude larger than the on-chip L1
caches. For example, 4 Kbyte on-chip L1 caches and 
256 Kbyte - 512 Kbyte off-chip L2 external caches are 
common. 

In a typical cache arrangement, the second level 
L2 cache includes all first level cache entries as subsets.
In other words, all of the entries of the first level
L1 caches are also stored in the second level L2
cache. In this manner, accesses to the L2 cache do
not have to inspect the L1 caches unless there is an
indication that the requested instruction or data is
also stored in the L1 caches.

Both "direct mapped" and "associative" caches 
are known to increase memory performance. In direct
mapped caches, a particular block or line of information
can only be stored in a single location in the cache
according to the block-frame address of the block or 
line. In a "fully associative" cache, the block can be 
placed anywhere within the cache, whereas in a "set- 
associative" cache the block is restricted to be stored 
in certain sets of storage locations. In a 2-way set
associative cache, each set in the cache can store 2
blocks of information. In a 4-way set associative
cache, each set in the cache can store 4 blocks of
information. Cache performance generally increases
with increased associativity. However, increased
associativity tends to require caching circuits of
increased complexity.
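
The placement rules above can be illustrated with a minimal sketch.
It assumes the conventional modulo mapping from block address to set,
which the text does not spell out; the function name set_index and the
eight-block cache size are hypothetical.

```python
def set_index(block_address: int, num_blocks: int, ways: int) -> int:
    """Return the set that a block maps to in a cache of num_blocks blocks."""
    num_sets = num_blocks // ways
    return block_address % num_sets  # assumed modulo mapping

NUM_BLOCKS = 8  # hypothetical tiny cache

# Direct mapped: 1 way, so a block has exactly one possible location.
direct = set_index(12, NUM_BLOCKS, ways=1)
# 2-way set associative: 4 sets, two candidate locations within the set.
two_way = set_index(12, NUM_BLOCKS, ways=2)
# Fully associative: one set spanning the whole cache.
fully = set_index(12, NUM_BLOCKS, ways=NUM_BLOCKS)
```

With more ways there are fewer sets, so a block has more candidate
locations and is less likely to displace a block that is still in use.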

A "miss cache" is described by Norman P. Jouppi
in his publication entitled "Improving Direct-Mapped
Cache Performance By The Addition Of A Small Fully-
Associative Cache And Prefetch Buffers", IEEE Seventh
Annual Symposium On Computer Architecture,
1990. The described miss cache is a small, fully
associative cache which is located between a first level
direct-mapped cache and its refill path. If a miss
occurs in the direct-mapped cache but a hit occurs in the
miss cache, then significant time is saved by avoiding
an access to main memory. Such miss caches are
typically very small and hold 2-5 entries or blocks in
one example.

Jouppi also describes an improvement to miss-caching,
namely the "victim cache". A victim cache is
a small, fully associative cache as described with
reference to the miss cache, except that the small fully
associative cache (victim cache) is loaded with the
victim of the miss instead of the requested block. In
other words, when a cache miss occurs in the direct
mapped L1 cache, the block or "victim" that is discarded
from the L1 cache is stored in the victim cache.
The victim caches described by Jouppi typically hold
1-5 entries. The goal of Jouppi's victim cache is to
increase the performance of a direct mapped first level
cache to a level approximating the performance of a
set associative cache by the addition of a small (1-5
entry) fully associative victim cache. The victim
cache contains only entries that have recently been
kicked out of the direct mapped first level cache. From
the above it is seen that the goal of the Jouppi victim
cache is to increase the performance of a direct mapped
cache.

With the ever increasing demand on memory for
faster access which is caused by processors with
higher clock speeds and larger appetites for instructions
and data, even faster cache memory systems
than those presently available are clearly desirable.

We will describe a cache memory system with an
improved performance of a first level, set associative
memory cache.

We will describe a cache memory system with
increased cache performance while avoiding undue
increases in the amount of chip area consumed by the
cache.

We will describe a cache memory system with an
increase in the performance of split instruction/data
first level caches without an increase in size of such
first level caches.

The described cache memory system conserves
power.

In accordance with one embodiment of the present
invention, a microprocessor is provided including
a semiconductor die and a microprocessor core situated
on the semiconductor die. The microprocessor
also includes a first level set associative cache situated
on the semiconductor die and coupled to the
microprocessor core. The first level cache exhibits a
predetermined byte size sufficiently large to store a 
predetermined number of information entries. In one 
embodiment, the first level cache is a split instruction 
cache - data cache. The microprocessor further in- 
cludes a replacement cache situated on the semicon- 
ductor die and coupled to the first level cache. The re- 
placement cache stores information entries which are 
discarded from the first level cache as a result of 
cache misses in the first level cache. The replace- 
ment cache is at least as large as approximately one 
half the size of the first level cache. For highest performance
at a given clock rate, both the first level
cache and the replacement cache are set associative 
caches. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The features of the invention believed to be novel 
are specifically set forth in the appended claims.
However, the invention itself, both as to its structure 
and method of operation, may best be understood by 
referring to the following description and accompany- 
ing drawings. 

FIG. 1 is a block diagram of a conventional cache
memory architecture including first and second level
cache memories. 

FIG. 2 is a block diagram of a replacement cache 
memory architecture in accordance with the present 
invention. 

FIG. 3A is a flowchart showing the operation of 
the replacement cache architecture during a memory 
read operation. 

FIG. 3B is a flowchart showing the operation of
the replacement cache architecture during a memory 
write operation. 

FIG. 4 is a block diagram of the first level linearly
addressed instruction cache employed by the present
invention. 

FIG. 5 is a representation of an entry of the FIG. 
4 instruction cache along with the corresponding lin- 
ear and physical addresses. 

FIG. 6 is a block diagram of a linear tag array and 
a store array of the FIG. 4 instruction cache. 

FIG. 7 is a block diagram of a linearly addressed 
data cache employed by the present invention. 

FIG. 8 is a representation of an entry of the data 
cache of FIG. 7 along with the corresponding linear 
and physical addresses. 

FIG. 9 is a block diagram of a linear tag array and
a data store array of the FIG. 7 data cache. 

FIG. 10 is a block diagram of a physical tag circuit
employed by the present invention. 

FIG. 11 is a block diagram of a translation looka- 
side buffer employed by the present invention. 



FIG. 12 is a block diagram of an entry of the physical
tag circuit of FIG. 10 and an entry of the translation
lookaside buffer of FIG. 11 along with the corresponding
linear and physical addresses.

FIG. 13 is a block diagram of the replacement
cache employed by the present invention.

FIG. 14A is a physical address of an instruction
or data employed by the addressing scheme employed
by the memory architecture of the present
invention.

FIG. 14B is a representation of an entry or line 
employed by the memory architecture of the present 
invention. 

I. Microprocessor Addressing

Before discussing the invention in detail, it is
helpful to have an understanding of the addressing
scheme employed by a conventional microprocessor
architecture such as the Intel X86 architecture. The
X86 architecture is a microprocessor architecture
which has gained widespread acceptance. This architecture,
first introduced in the i386™ microprocessor,
is also the basic architecture of both the i486™
microprocessor and the Pentium™ microprocessor, all
available from the Intel Corporation of Santa Clara,
California. The X86 architecture provides for three
distinct types of addresses, namely a logical (i.e.,
virtual) address, a linear address and a physical
address.

The logical address represents an offset from a
segment base address. The segment base address is
accessed via a selector. More specifically, the selector,
which is stored in a segment register, is an index
which points to a location in a global descriptor table
(GDT). The GDT location stores the linear address
corresponding to the segment base address. More
discussion of linear addressing is found in the copending
patent application entitled "Linearly Addressable
Microprocessor Cache" by David B. Witt
(Serial No. 148,381, filed 11/3/93, Atty Docket No.
M-2412US), the disclosure of which is incorporated
herein by reference.

The translation between logical and linear addresses
depends on whether the microprocessor is in
Real Mode or Protected Mode. When the microprocessor
is in Real Mode, then a segmentation unit
shifts the selector left four bits and adds the result to
the offset to form the linear address. When the microprocessor
is in Protected Mode, then the segmentation
unit adds the linear base address pointed to by
the selector to the offset to provide the linear address.
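
The two translation rules can be sketched as follows. This is an
illustration only: the function names are hypothetical, and the
descriptor table is modelled as a plain dictionary keyed by selector,
which simplifies the real selector and descriptor formats.

```python
def real_mode_linear(selector: int, offset: int) -> int:
    # Real Mode: shift the selector left four bits, then add the offset.
    return (selector << 4) + offset

def protected_mode_linear(segment_base: int, offset: int) -> int:
    # Protected Mode: add the linear base address found via the selector
    # to the offset; the result is kept to 32 bits.
    return (segment_base + offset) & 0xFFFFFFFF

# Hypothetical descriptor table: selector -> segment base address.
gdt = {0x08: 0x00100000}

real = real_mode_linear(0x1234, 0x0010)
protected = protected_mode_linear(gdt[0x08], 0x2000)
```

Here real is 0x12350 and protected is 0x00102000.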

The physical address is the address which appears
on the address pins of the microprocessor and
is used to physically address external memory. The
physical address does not necessarily correspond to
the linear address. If paging is not enabled then the
32-bit linear address corresponds to the physical
address. If paging is enabled, then the linear address
must be translated into the physical address. A paging
unit, which is usually included as part of the
microprocessor's memory management unit, performs this
translation.

The paging unit uses two levels of tables to trans- 
late the linear address into a physical address. The 
first level table is a Page Directory and the second 
level table is a Page Table. The Page Directory includes
a plurality of page directory entries; each entry
includes the address of a Page Table and information
about the Page Table. The upper 10 bits of the linear
address (A22 - A31) are used as an index to select
a Page Directory Entry. The Page Table includes a
plurality of Page Table entries; each Page Table entry
includes a starting address of a page frame and statistical
information about the page. Address bits A12 -
A21 of the linear address are used as an index to se- 
lect one of the Page Table entries. The starting ad- 
dress of the page frame is concatenated with the low- 
er 12 bits of the linear address to form the physical ad- 
dress. 
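
The two-level lookup can be sketched as below. The bit fields follow
the text (A22 - A31, A12 - A21, and the lower 12 bits); the dictionary
layout of the tables and all names are assumptions made for illustration,
and the per-entry status information is omitted.

```python
def split_linear(linear: int):
    """Split a 32-bit linear address into its three fields."""
    directory_index = (linear >> 22) & 0x3FF  # bits A22 - A31
    table_index = (linear >> 12) & 0x3FF      # bits A12 - A21
    page_offset = linear & 0xFFF              # lower 12 bits
    return directory_index, table_index, page_offset

def translate(linear: int, page_directory: dict, page_tables: dict) -> int:
    d, t, offset = split_linear(linear)
    table_id = page_directory[d]           # Page Directory Entry selects a Page Table
    frame_base = page_tables[table_id][t]  # Page Table entry gives the page frame start
    return frame_base | offset             # concatenate frame start and offset

# Hypothetical tables: directory entry 1 points at table "pt_a",
# whose entry 3 maps to the page frame starting at 0x0007F000.
page_directory = {1: "pt_a"}
page_tables = {"pt_a": {3: 0x0007F000}}

physical = translate(0x00403025, page_directory, page_tables)
```

Because a page frame starts on a 4 Kbyte boundary, its low 12 bits are
zero and the concatenation reduces to a bitwise OR.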

Because accessing two levels of table for every
memory operation substantially affects performance
of the microprocessor, the memory management unit
generally also includes a cache of the most recently 
accessed page table entries. This cache is called a 
translation lookaside buffer (TLB). The microproces- 
sor only uses the paging unit when an entry is not in 
the TLB. 
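
A minimal sketch of this behaviour follows. The class name, the
four-entry capacity, the least recently used eviction and the walk
callable standing in for the paging unit are all assumptions, not
details taken from the text.

```python
from collections import OrderedDict

class TLB:
    """Small cache of recently used page translations."""

    def __init__(self, walk, capacity=4):
        self.walk = walk              # stand-in for the paging unit
        self.capacity = capacity      # hypothetical size
        self.entries = OrderedDict()  # linear page number -> frame start
        self.walks = 0                # how often the paging unit was used

    def translate(self, linear: int) -> int:
        page, offset = linear >> 12, linear & 0xFFF
        if page in self.entries:
            self.entries.move_to_end(page)        # mark as most recently used
        else:
            self.walks += 1                       # TLB miss: use the paging unit
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # drop least recently used
            self.entries[page] = self.walk(page)
        return self.entries[page] | offset

# Hypothetical page mapping used only for this example.
tlb = TLB(walk=lambda page: (page + 0x100) << 12)
first = tlb.translate(0x00403025)   # miss: paging unit consulted
second = tlb.translate(0x00403FFF)  # same page: served from the TLB
```

Both accesses fall in the same page, so the paging unit is consulted
only once.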

The first processor which conformed to the x86 
architecture and which also included a cache was the 
486 processor. The 486 processor employed an 8 
Kbyte unified cache. In contrast, the Pentium™ processor
includes separate 8 Kbyte instruction and data
caches. The 486 processor cache and the Pentium™ 
processor caches are accessed via physical address- 
es; however the functional units of these processors 
operate with logical addresses. Accordingly, when the 
functional units require access to these caches, the 
logical address must be converted to a linear address 
and then to a physical address. 

In microprocessor architectures other than the 
X86 architecture, it is known to use virtually addressed
caches (i.e., logically addressed caches) to eliminate the
address translation time from a cache hit. However, 
because input/output devices (I/O) use physical ad- 
dresses, mapping is required for the I/O to interact 
with the cache. In these systems, there are generally 
only two levels of addressing, virtual and physical,
and thus only a single translation is required for the
physically addressed I/O devices to interact with the
virtually addressed cache. Additionally, with a virtually
addressed cache, every time a process is switched,
the virtual addresses refer to different physical
addresses, and thus, the cache must be flushed as
the virtually addressed cache entries are potentially
invalid. Additionally, with a virtually addressed cache,
it is possible for two different virtual addresses to
correspond to the same physical address. These duplicate
addresses are called aliases and could result in
two locations in a virtual cache having information
from the same physical address, the information in
only one of the locations being modified.

II. Microprocessor Cache Architecture Overview

The following sets forth a description of the best 
mode contemplated for carrying out the invention. 
The description is intended to be illustrative of the invention
and should not be taken to be limiting.

FIG. 2 is a block diagram which depicts an integrated
microprocessor 10 that uses the cache memory
architecture of the present invention. Microprocessor
10 contains a microprocessor core 15 which
includes an integer unit 20 for handling integer operations
and a floating point unit 25 for handling floating
point operations. Microprocessor core 15 further in- 
cludes an instruction decoder 108 which decodes in- 
structions and a load/store unit 134 which supervises 
loading and storing operations. Microprocessor core 
15 is alternatively referred to as the central process- 
ing unit (CPU) of microprocessor 10. Microprocessor 
core 15 includes a DATA port to which data are provided
for processing and an INSTRUCTION port to
which instructions are provided for execution. 

In the preferred embodiment of the invention, mi- 
croprocessor core 15 is a superscalar processor. 
However, the invention is applicable to other types of
processors as well, for example scalar and vector
processors. While in FIG. 2 only integer unit 20, FPU
25, decoder 108 and load/store unit 134 are shown
within microprocessor core 15, it should be understood
that other functional units such as a branching
unit which predicts branches in the program being
executed and other functional units may also be included
in core 15.

Microprocessor 10 is an integrated processor in
the sense that other computer components besides
microprocessor core 15 are included together on the
same semiconductor die 27 as microprocessor core
15. For example a first level cache 30, indicated within
dashed lines, is incorporated in microprocessor 10
on die 27. First level cache 30 includes a first level
instruction store array 180 and a first level data store
array 312 which are situated on die 27 and respectively
coupled to the INSTRUCTION port and DATA
port of microprocessor core 15.

It is noted that first level cache 30 is a split
instruction-data, set associative cache in this embodiment.
The first level instruction store array 180 and
first level data store array 312 are also coupled via an
internal address/data bus (IAD) 102 to a bus interface
unit 45. IAD bus 102 and bus interface unit 45 are both
situated on die 27. Bus interface unit 45 is of the
conventional type and includes a physical address port
and a data port which are coupled to corresponding
ports on a main memory 50. Main memory 50 is located
externally to microprocessor 10 and is alternatively
referred to as external memory. One or more
input/output (I/O) devices 55 may also be coupled to
bus interface unit 45 as shown in FIG. 2.

First level cache 30 further includes a linear in- 
struction tag array 182 and a linear data tag array 
310. Linear instruction tag array 182 is an array which
stores the linear addresses or tags which indicate 
those instructions which are presently stored in first 
level instruction store array 180. Linear data tag array 
310 is an array which stores the linear addresses or 
tags which indicate the data which are presently stor- 
ed in first level data store array 312. 

First level cache 30 also includes a physical in- 
struction tag array 390 and a physical data tag array 
392 which are coupled to instruction store array 180 
and data store array 312 via internal address/data bus
102. Physical instruction tag array 390 is an array that
stores the actual physical addresses or tags which in- 
dicate those instructions that are presently stored in 
first level instruction store array 180. Physical data 
tag array 392 is an array that stores the physical ad- 
dresses or tags which indicate the data that are pre- 
sently stored in first level data store array 312. 

First level instruction store array 180 and linear
instruction tag array 182 together form an instruction
cache 104 (indicated by dashed lines) within first level
cache 30. First level data store array 312 and linear
data tag array 310 together form a data cache 150
within first level cache 30.

A replacement cache 60 is coupled to internal
address/data (IAD) bus 102 as shown in FIG. 2.
Replacement cache 60 is a unified cache in which both
instructions and data are stored. Replacement cache 
60 includes a store array 65 and a tag array 70. In- 
structions and data are stored in store array 65 and 
the physical addresses of such instructions and data 
in main memory are stored as address tags in tag array
70. In this manner by scanning the tags in tag array
70, it can be determined whether replacement
cache 60 contains a particular instruction or piece of
data. 

In one embodiment of the invention wherein first 
level instruction store array 180 is 16 Kbytes in size 
and first level data store array 312 is 8 Kbytes in size, 
a replacement cache 60 with a 32 Kbyte store array 
65 is employed. In the preferred embodiment of the
invention, the instruction cache and data cache of
first level cache 30 and replacement cache 60 are four-way
set associative. Other embodiments of the invention
are contemplated however wherein the replacement
cache employs other levels of set associativity
or is direct mapped. Replacement cache 60 advantageously
increases the performance of the split first
level caches, namely instruction cache 104 and data
cache 150, without necessitating an increase in the
size of those caches.

Unlike many other cache structures, the entries
of replacement cache 60 are not subsets of first level
cache 30. Rather, the entries of replacement cache
60 are discarded entries from first level cache 30
which are kicked out of first level cache 30 according
to the particular replacement algorithm used by first
level cache 30. Replacement algorithms which may be
used for the first level cache replacement algorithm
include conventional least recently used (LRU), least
frequently used (LFU) and "random" replacement
algorithms. The terms "entry", "block" and "line" are
regarded as being synonymous and having their
conventional meaning with respect to cache technology.

For purposes of this discussion, it is assumed
that microprocessor 10 has been operating for a time
sufficient for the first level caches, namely instruction
store array 180 and data store array 312, to be filled
with instructions and data, respectively. When a first
level cache hit occurs, the requested instruction or
data is contained in instruction store array 180 or data
store array 312. If the requested information is an
instruction, then the tags in linear instruction tag array
182, or alternatively, physical instruction tag array
390, are scanned in parallel within each array. If the
requested information is data, then the tags in linear
data tag array 310, or alternatively, physical data tag
array 392, are scanned in parallel within each array.
More information as to such scanning of linear
instruction tag array 182 and physical instruction tag
array 390 is found in the copending patent application
entitled "Linearly Addressable Microprocessor
Cache" by David B. Witt (Serial No. 146,381, filed
11/3/93, Atty Docket No. M-2412US), the disclosure
of which is incorporated herein by reference. Briefly
however, it is noted that the linear tags are scanned
first for a hit or miss. If a miss occurs, then the physical
tags are scanned for a hit or miss. It is noted that
since the split first level instruction and data caches
are linearly addressed, it is possible to have aliased
copies. Aliased copies are information elements
(instruction or data elements) which have different linear
addresses but the same physical address. The linear
addressing scheme allows only one copy of the
physical data to be present to avoid having multiple
updates of the same physical address pending. Thus,
checking both the instruction and data physical tags
assures that no aliased copy can exist for the desired
physical address. If an aliased entry is found to exist
for the desired physical address, then the linear tag
stored in the linear (instruction or data) tag array is
overwritten with the new linear tag corresponding to
the matched physical tag in the physical (instruction
or data) tag array. In other words, the linear tag is
overwritten with the new linear tag at the entry where
the alias occurs. Since one information element is
always cached in the case of aliasing, an access is not
made to replacement cache 60 in the case of an alias
hit. If misses occur in the physical tag arrays, then the
cache memory system checks the replacement tag
array 70 which always contains entries that are not in
the first level arrays. There are no aliasing concerns
with respect to replacement cache 60 since replacement
cache 60 employs physical tags and physical
addressing.

Returning now to a discussion of cache hits and
misses in first level cache 30, it is noted that if a match
occurs between the address of the requested instruction
or data information and a tag in one of the first level
cache tag arrays, then a first level cache hit has
occurred. In this event, the addressed instruction or
data is retrieved from the appropriate corresponding
instruction store array or data store array and provided
to microprocessor core 15 for processing.

However, if no match occurs between the address
of the requested instruction or data information
and a tag in the first level instruction and data cache
tag arrays, then a first level cache miss has occurred.
When a first level cache miss occurs, the requested
instruction or data is in neither first level instruction
store array 180 nor first level data store array 312. In
this event, the replacement tags in replacement tag
array 70 of replacement cache 60 are scanned to see
if the requested instruction or data is contained in
replacement cache 60.

If a match is found between the physical address
of the requested instruction or data and a replacement
tag in replacement tag array 70, then a replacement
cache hit has occurred and the requested information
is contained within store array 65 of replacement
cache 60. In this situation, the block within
replacement cache 60 which contains the addressed
instruction or data is retrieved from replacement store
array 65. This retrieved information is provided to the
first level cache for storage therein and ultimately to
microprocessor core 15 for processing. The retrieved
information (i.e., the entry or block containing the
retrieved information) is transmitted to first level cache
30 for storage in the appropriate storage array thereof.
More particularly, if the retrieved information is an
entry containing an instruction, then the retrieved
information is stored in first level instruction store array
180. If the retrieved information is an entry containing
data, then the retrieved information is stored in first
level data store array 312.

In the case of a replacement cache hit, the hit entry
(the entry for which a hit occurs) from replacement
cache 60 is stored at a location in the first level cache
according to its address as per the set-associative
nature of the first level cache. Moreover, the hit entry
is stored at a location within the first level cache which
is considered to be available as per the particular
replacement algorithm selected for first level
cache 30. For example, a least recently used (LRU),
least frequently used (LFU) or random replacement
algorithm may be employed as the replacement algorithm
of set associative first level cache 30.

When the retrieved information is transmitted to 
first level cache 30 for storage, an entry or block of 
information from cache 30 is discarded from cache 
30. This information is denoted as discard information 
or as the discard information entry and is determined 
by the particular type of replacement algorithm selected
for first level cache 30. The discard information
entry is transmitted from first level cache 30 to replacement
cache 60 for storage as an entry thereof.

If there is a replacement cache miss, then main
memory 50 is accessed and the requested information
is retrieved therefrom. The entry retrieved from
main memory 50 is transmitted to first level cache 30
and stored therein. The entry is then provided to mi- 
croprocessor core 15 for processing by core 15. In re- 
sponse, a discard entry is ejected from first level 
cache 30. The discard entry is transmitted to replace- 
ment cache 60 and is stored in replacement store ar- 
ray 65. 
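
The hit and miss handling described above can be summarized in a small
behavioural sketch. It is not the circuit of FIG. 2: it assumes LRU
replacement in both caches, a modulo mapping of block address to set,
that a hit entry leaves the replacement cache when it moves to the first
level cache, and that a modified flag travels with each entry. All names
and sizes are hypothetical.

```python
from collections import OrderedDict

class SetAssocCache:
    """Set associative cache with LRU replacement, keyed by block address."""

    def __init__(self, num_sets, ways):
        self.num_sets, self.ways = num_sets, ways
        self.sets = [OrderedDict() for _ in range(num_sets)]  # block -> modified flag

    def _set(self, block):
        return self.sets[block % self.num_sets]

    def lookup(self, block):
        s = self._set(block)
        if block in s:
            s.move_to_end(block)  # mark as most recently used
            return True
        return False

    def remove(self, block):
        return self._set(block).pop(block)  # returns the modified flag

    def insert(self, block, modified=False):
        """Store block; return the discarded (block, modified) pair or None."""
        s = self._set(block)
        discard = s.popitem(last=False) if len(s) >= self.ways else None
        s[block] = modified
        return discard

def read(block, l1, rc, log):
    if l1.lookup(block):
        log.append("L1 hit")
        return
    if rc.lookup(block):
        modified = rc.remove(block)  # hit entry is forwarded to the first level cache
        log.append("replacement hit")
    else:
        modified = False
        log.append("main memory access")
    discard = l1.insert(block, modified)
    if discard is not None:
        evicted = rc.insert(*discard)  # discard entry goes to the replacement cache
        if evicted is not None and evicted[1]:
            log.append("write back block %d" % evicted[0])

# Tiny hypothetical configuration: one set of two ways in each cache.
l1, rc, log = SetAssocCache(1, 2), SetAssocCache(1, 2), []
for block in [1, 2, 3, 1, 3]:
    read(block, l1, rc, log)
```

In this run, block 1 is discarded to the replacement cache when block 3
arrives and is then recovered from it without a main memory access.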

To conserve power in microprocessor 10, replacement
cache 60 is not clocked until it is time to
access replacement cache 60. In other words, replacement
cache 60 is not clocked until it is time to
scan tag array 70 of replacement cache 60 for hits.
Replacement cache 60 remains in an idle, power-conserving
state until that time. A clock circuit 72 provides
a reference clock or time base signal (CLOCK)
to microprocessor core 15 and first level cache 30 as
shown. This CLOCK signal is also supplied to clock
control circuit 74 which supplies an RCLOCK signal
to replacement cache 60 to provide clocking for cache
60. At input 74A, clock control circuit 74 receives first
level miss information from first level cache 30 to
indicate when a miss has occurred in first level cache
30. Clock control circuit 74 generates an idle
RCLOCK clock signal (a clock signal with no clock
pulses) until it receives first level miss information.
When clock control circuit 74 receives first level miss
information, clock control circuit 74 generates an
RCLOCK signal with active clock pulses which cause
replacement cache 60 to be clocked. In response,
replacement cache 60 becomes active and draws power.
When the access to replacement cache 60 is complete
as indicated to clock control input 74B by a complete
signal from replacement cache 60, then the
pulses of the RCLOCK signal cease. In this manner,
clocking of replacement cache 60 ceases and power
conservation again commences and continues until
the next replacement cache access.
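
The gating behaviour of clock control circuit 74 can be modelled in a
few lines. This is a behavioural sketch only; the class and method
names are hypothetical and the event list is an invented timeline.

```python
class ClockControl:
    """Passes CLOCK pulses through as RCLOCK only during a replacement access."""

    def __init__(self):
        self.active = False  # idle: RCLOCK carries no pulses

    def first_level_miss(self):  # models input 74A
        self.active = True

    def access_complete(self):   # models input 74B
        self.active = False

    def rclock(self, clock_pulse: bool) -> bool:
        return clock_pulse and self.active

ctrl = ClockControl()
pulses = 0  # RCLOCK pulses delivered to the replacement cache
for event in ["tick", "tick", "miss", "tick", "tick", "complete", "tick"]:
    if event == "miss":
        ctrl.first_level_miss()
    elif event == "complete":
        ctrl.access_complete()
    elif ctrl.rclock(True):
        pulses += 1
```

Of the five CLOCK ticks in this timeline, only the two that fall
between the miss and the complete signal reach the replacement cache.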

A flowchart showing the operation of the replacement
cache architecture during a memory read or
load operation is shown in FIG. 3A. Prior to actually
commencing a load operation, replacement cache 60
is placed in an idle state wherein replacement cache
60 is not clocked as per block 500. This action conserves
power. Replacement cache 60 stays in this
idle, power conserving state until replacement cache
60 is actually accessed. When a load instruction is
decoded at decision block 502, a load operation is
commenced at start load operation block 504. The
first level tag arrays are then scanned as per block
505. More particularly, if the requested information is
an instruction then linear instruction tag array 182 is
scanned, whereas if the requested information is data
then linear data tag array 310 is scanned. The address
of the requested information is compared with the first
level tags which are scanned as per block 510. A test is
then conducted at decision block 515 to determine if
any of such tags match the address of the requested
information. If there is a match between the address
of the requested information and the scanned first
level linear tags, then the requested information is
retrieved from the appropriate first level store array
(instruction store array 180 or data store array 312) and
transmitted to microprocessor core 15.

However, if there is no such match, then the appropriate
physical tag array is scanned as per block 525. More
particularly, if the requested information is an
instruction then physical instruction tag array 390 is
scanned, whereas if the requested information is data
then physical data tag array 392 is scanned. A test is
then conducted at decision block 530 to determine if the
address of the requested information matches any of the
physical tags which are scanned. If such an address
match is found, then an aliasing condition exists. When
such an aliasing condition is found, the subject linear
tag stored in the linear tag array is overwritten with
the corresponding new linear tag of the request that
matched the physical tag in the physical tag array as
per block 535. The requested information is then
transmitted to microprocessor core 15 as per block 540.
It is noted that during the above described operations
from block 500 to block 530 inclusive, replacement
cache 60 is not being clocked and thus power is
conserved.
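
The aliasing check of blocks 505 to 540 may be
illustrated by the following Python sketch, which is
offered as a simplified model only. The Entry class, the
read function and the page_table dictionary standing in
for the linear to physical translation are all
hypothetical; set indexing is omitted so that only the
order of the two tag comparisons and the overwriting of
the linear tag are shown.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    linear_tag: int
    physical_tag: int
    data: str


def read(entries, linear_tag, translate):
    """Return (data, outcome) for a first level read (simplified, hypothetical)."""
    # Blocks 505-515: compare the request against the linear tags.
    for entry in entries:
        if entry.linear_tag == linear_tag:
            return entry.data, "linear_hit"
    # Blocks 525-530: translate and compare against the physical tags.
    physical_tag = translate(linear_tag)
    for entry in entries:
        if entry.physical_tag == physical_tag:
            # Block 535: aliasing, so the old linear tag is overwritten.
            entry.linear_tag = linear_tag
            return entry.data, "alias_hit"
    return None, "miss"


# Two linear tags (0x10 and 0x20) map to the same physical tag 0x7.
page_table = {0x10: 0x7, 0x20: 0x7, 0x30: 0x9}
entries = [Entry(linear_tag=0x10, physical_tag=0x7, data="payload")]

first = read(entries, 0x20, page_table.get)   # alias detected, tag rewritten
second = read(entries, 0x20, page_table.get)  # now hits on the linear tag
third = read(entries, 0x30, page_table.get)   # first level cache miss
```

Only the third request, which matches neither tag array,
would cause clocking of the replacement cache to
commence.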

However, returning to decision block 530, if it is
determined that there is no match between the address of
the requested information and the scanned physical tags,
then a first level cache miss has occurred. In the event
of such a first level cache miss, clocking of unified
replacement cache 60 commences as per block 545 and
power starts to be drawn in replacement cache 60 as
accessing of cache 60 begins. More particularly,
replacement tag array 70 is scanned in block 550. A test
is conducted at decision block 555 to determine if the
address of the requested information in the read
operation matches any of the scanned replacement tags.
If such a match is found, then a replacement cache hit
has occurred. In the event of such a replacement cache
hit, replacement cache 60 transmits the entry in
replacement store array 65 which contains the requested
information to first level cache 30 as per block 560.
The transmitted entry is stored in first level cache 30
at a location determined by the particular set
associative addressing scheme selected for such cache as
per block 565. The requested information is then
transmitted from first level cache 30 to microprocessor
core 15 as per block 570. At block 575, first level
cache 30 kicks out a discard entry as per the particular
replacement algorithm used by cache 30. The discard
entry is stored in replacement cache 60 at a location
determined by the particular set associative addressing
scheme selected for such cache as per block 580.

However, if no replacement tag match is found at
decision block 555, then external main memory 50 is
accessed to obtain the requested information for the
subject read operation as per blocks 585 and 590. In
this case the requested information, which is stored in
main memory and which is sought in the subject read
operation, is designated as main memory information. The
entry in main memory 50 which contains the requested
information is retrieved from main memory 50 at block
590 and is transmitted to first level cache 30 at block
595. This entry is then stored in first level cache 30
as per block 600. The requested main memory information
in this entry is transmitted from first level cache 30
to microprocessor core 15 at block 605. Process flow
then continues to block 575 at which the first level
cache kicks out a discard entry and then to block 580 at
which the discard entry from first level cache 30 is
stored in replacement cache 60. Flow then continues back
to the start load or other operation at block 500.
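
The read flow of FIG. 3A may be summarized by the
following Python sketch. It is a simplified model under
stated assumptions and not the disclosed hardware: both
caches are treated as fully associative with a least
recently used policy rather than 4-way set associative,
the linear and physical tag checks are merged into a
single first level lookup, and the names VictimHierarchy,
load and rc_clocked are hypothetical.

```python
from collections import OrderedDict


class VictimHierarchy:
    """Toy first level cache backed by a replacement cache (hypothetical model)."""

    def __init__(self, l1_entries, rc_entries, memory):
        self.l1 = OrderedDict()   # block address -> data, oldest first
        self.rc = OrderedDict()   # replacement cache contents
        self.l1_entries = l1_entries
        self.rc_entries = rc_entries
        self.memory = memory      # stands in for main memory 50
        self.rc_clocked = False

    def load(self, block):
        self.rc_clocked = False               # block 500: replacement cache idle
        if block in self.l1:                  # blocks 505-540: first level hit
            self.l1.move_to_end(block)
            return self.l1[block], "l1_hit"
        self.rc_clocked = True                # block 545: clocking commences
        if block in self.rc:                  # blocks 550-560: replacement hit
            data, source = self.rc.pop(block), "rc_hit"
        else:                                 # blocks 585-595: main memory access
            data, source = self.memory[block], "memory"
        self.l1[block] = data                 # blocks 565 and 600
        if len(self.l1) > self.l1_entries:    # block 575: kick out a discard entry
            victim, victim_data = self.l1.popitem(last=False)
            self.rc[victim] = victim_data     # block 580
            if len(self.rc) > self.rc_entries:
                self.rc.popitem(last=False)   # unmodified entries are discarded
        return data, source


memory = {addr: addr * 10 for addr in range(8)}
h = VictimHierarchy(l1_entries=2, rc_entries=2, memory=memory)
sources = [h.load(block)[1] for block in (0, 1, 2, 0, 0)]
print(sources)  # ['memory', 'memory', 'memory', 'rc_hit', 'l1_hit']
```

The fourth access shows the benefit of the arrangement:
block 0, discarded from the first level cache by the
third access, is recovered from the replacement cache
without a main memory access.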

It is noted that as an alternative to the process flow
depicted in FIG. 3A with respect to the aforementioned
power conservation feature, replacement cache 60 can be
returned to the idle state immediately after access to
cache 60 is complete. In other words, clocking of
replacement cache 60 can cease after the replacement
cache access is complete without waiting until process
flow returns to cache idle block 500.

A flowchart showing the operation of the replacement
cache architecture during a memory write or store
operation is depicted in FIG. 3B. Prior to actually
commencing a store operation, replacement cache 60 is
placed in an idle state wherein replacement cache 60 is
not clocked as per block 700. Power is conserved when
replacement cache 60 is not clocked. Replacement cache
60 stays in this idle, power conserving state until
replacement cache 60 is actually accessed in the course
of the write operation. When a store instruction is
decoded at decision block 702, a store operation is
commenced at start store operation block 704.

A test is then conducted at decision block 705 to
determine if a linear tag hit has occurred. In other
words, a test is performed to determine if the target
address associated with the pending write operation
matches any of the linear data tags in first level
linear data tag array 310. If a match is found, then the
data for the subject write operation is written at block
710 to the entry in first level data store array 312
containing the data associated with the write address.

However, if no such tag match is found, then the
physical tags in physical data tag array 392 are scanned
as per block 715. A test is conducted at decision block
720 to determine if a physical tag hit has occurred.
More particularly, a check is made to determine if the
target address of the pending write operation matches
any of the tags of physical data tag array 392. If such
a match is found then an aliasing condition exists. In
the case of aliasing, multiple linear addresses are
associated with the same physical address. In this
event, the old linear tag is overwritten with the new
linear tag at block 725. The data associated with the
pending write operation is then written to the entry in
first level data store array 312 which corresponds to
the target address of the write operation as per block
730.

However, if there is no physical tag hit at decision
block 720, then clocking of replacement cache 60 is
resumed at block 732 and the replacement cache tags in
replacement tag array 70 are scanned at block 735. A
test is then conducted at decision block 740 to
determine if a replacement tag hit has occurred. In
other words, a comparison is performed to see if the
target address of the pending write operation matches
any tag in replacement tag array 70. If there is no such
tag match, then a replacement cache miss has occurred
and the pending write operation is carried out upon
external memory 50 as per block 745. However, if there
is a replacement tag match which signifies that a
replacement tag hit has occurred, then the pending write
operation is carried out on the replacement cache. To
accomplish this, an entry to be replaced in the first
level cache 30 is first allocated as per block 750. A
rotation is then performed at block 755 whereby the
replaced entry from first level cache 30 is rotated or
switched with the hit entry in replacement cache 60. In
other words, the entry for which the hit occurred in
replacement cache 60 is transferred to first level data
store array 312 and the entry which was allocated for
replacement in data store array 312 is transferred to,
and stored in, replacement store array 65. The status of
the entry thus written in replacement cache 60 is then
updated to modified as per block 760.

A test is conducted at decision block 765 to determine
if the previous state was shared. If the previous state
was not shared, then the new state is modified as per
block 770. However, if the previous state was shared,
then the entry is written to main memory 50 as per block
775. The new state is then regarded as exclusive as per
block 780.
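
The state test of blocks 765 to 780 reduces to a single
decision, which the following Python sketch illustrates.
The State enumeration and the function state_after_store
are hypothetical names; the sketch models only the
outcome described in the preceding paragraph and is not
a complete coherency protocol.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def state_after_store(previous):
    """Return (new_state, write_to_main_memory) for a store that hit an entry."""
    if previous is State.SHARED:
        # Blocks 775 and 780: the entry is written to main memory 50 and
        # the new state is regarded as exclusive.
        return State.EXCLUSIVE, True
    # Block 770: any other previous state leaves the entry modified.
    return State.MODIFIED, False


for previous in (State.SHARED, State.EXCLUSIVE, State.MODIFIED):
    new_state, write_out = state_after_store(previous)
    print(previous.name, "->", new_state.name, "write to main memory:", write_out)
```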

The terms Modified, Exclusive, Shared and Invalid as
used herein conform to the well-known MESI protocol
wherein modified, exclusive, shared and invalid (valid)
bits are used to provide status information with respect
to particular cache blocks or entries.

III. Detailed First Level and Replacement Cache
Operation

Returning to FIG. 2, first level data store array 312 is
coupled to load/store functional unit 134 of
microprocessor core 15 and to IAD bus 102. Physical tag
circuit 162 includes both physical instruction tag array
390 and physical data tag array 392. Physical tag
circuit 162 interacts with both first level instruction
cache store array 180 and first level data cache store
array 312 via IAD bus 102. In this particular
embodiment, instruction store array 180 and data store
array 312 are both linearly addressable caches.
Instruction store array 180 and data store array 312 are
physically separate. However, both of these cache arrays
are organized using the same architecture, i.e., both
caches include a store array along with a corresponding
tag array.

Microprocessor 10 also includes memory management unit
(MMU) 164 and bus interface unit 45 (BIU). Memory
management unit 164 is coupled to IAD bus 102 and
physical tag circuit 162. Bus interface unit 45 is
coupled to physical tag circuit 162 and IAD bus 102 as
well as an external microprocessor bus such as the 486
XL bus.

Microprocessor 10 executes computer programs
which include sequences of instructions. Computer 
programs are typically stored on a hard disk, floppy 
disk or other non-volatile storage media which are lo- 
cated in the computer system. When the program is 
run, the program is loaded from the storage media 
into a main memory 50 which is accessed by micro- 
processor 10 via bus interface unit 45. Once the in- 
structions of the program and associated data are in 
main memory 50, individual instructions are prepared 
for execution and ultimately executed by micropro- 
cessor 10. 

After being stored in main memory 50, the instructions
are passed via bus interface unit 45 to first level
instruction store array 180, where the instructions are
temporarily held. Instruction decoder 108 receives the
instructions from instruction cache 104. Instruction
decoder 108 examines the instructions and determines the
appropriate action to take. For example, decoder 108 may
determine whether a particular instruction is a PUSH,
POP, LOAD, AND, OR, EX OR, ADD, SUB, NOP, JUMP, JUMP on
condition (BRANCH) or other instruction. Depending on
which particular type of instruction that decoder 108
determines is present, the instruction is dispatched to
the appropriate functional unit of microprocessor core
15 for that type.

Referring to FIG. 4, instruction cache 104 is shown in
more detail. Instruction cache 104 is a linearly
addressed, 16 Kbyte 4-way set associative cache. Each
set includes 256 entries; each entry includes a sixteen
byte instruction block and a linear address tag.
Instruction cache 104 includes cache controller 170,
address circuit 172, linear instruction tag array 182
and instruction store array 180. Cache controller 170
provides control signals to orchestrate the various
operations of instruction cache 104. Address circuit 172
generates a linear fetch program counter (FETCH PC)
based upon a logical target program counter (LOGICAL PC)
which is received from microprocessor core 15. Address
circuit 172 also provides address generation and X86
protection checking associated with pre-fetching
instructions from external memory. Based upon the
current FETCH PC value, the instruction corresponding to
the main memory address associated with that value is
successively fetched from main memory 50 as
microprocessor core 15 progresses through the
instructions of a program in memory in the course of
executing that program. In other words, instructions are
prefetched from main memory and stored in instruction
store array 180 prior to actually being called into
microprocessor core 15 by the advancement of the FETCH
PC. Address circuit 172 functions as a translation
circuit for translating between logical addresses and
linear addresses. Instruction cache 104 stores
instructions received via IAD bus 102. When
microprocessor 10 accesses main memory 50 as in the case
of a cache miss in first level cache 30 and replacement
cache 60, the accessed entry from main memory is stored
in first level cache 30 for later use by microprocessor
core 15 should that be necessary. The FETCH PC value is
incremented and continues to progress forward as
successive instructions are retrieved from main memory
and stored in first level cache 30.

Instruction cache 104 is organized into two main arrays,
namely instruction cache store array 180 and linear tag
array 182. Instruction cache store array 180 stores 16
byte lines or entries. Linear tag array 182 stores the
linear address tags corresponding to the instructions.
Each of these arrays is addressed by the linear FETCH PC
address which is provided by address circuit 172.

Referring to FIG. 5, the upper order bits of the linear
FETCH PC address 186 are compared to the tags stored
within linear tag array 182; these bits are stored as a
linear tag when an entry is stored in instruction store
array 180. The middle order bits of the FETCH PC address
186 provide a cache index which is used to address a
block within the array and retrieve an entry from the
block of the array. The lowest order bits provide an
offset in the retrieved entry from BYTE0 of the
instruction block which is stored in instruction store
array 180, thus accessing the actual byte addressed by
the FETCH PC address.

Instruction cache entry 188 of cache 104 includes linear
address tag entry 190 and instruction entry 192.
Instruction entry 192 includes a sixteen byte (IBYTE0 -
IBYTE15) block of instructions. Linear address tag entry
190 includes a linear tag value (LTAG), linear tag valid
bit (TV), sixteen byte valid bits (BV0 - BV15) and valid
physical translation bit (P). The linear tag value,
which corresponds to the upper 20 bits of the linear
FETCH PC address, indicates the linear block frame
address of a block that is stored in the corresponding
store array entry. The linear tag valid value indicates
whether or not the linear tag value is valid. Each byte
valid bit indicates whether the corresponding byte of
the sixteen byte instruction entry is valid. The valid
physical translation bit indicates whether or not an
entry provides a successful physical tag hit.

Referring to FIG. 6, linear instruction tag array 182
and instruction store array 180 of linearly addressable
instruction cache 104 are shown in more detail. It
should be recalled that instruction cache 104 is part of
the split instruction-data first level cache 30.
Instruction cache 104 is arranged in four 4-Kbyte
columns, column 0, column 1, column 2 and column 3,
corresponding to the four sets of instruction cache 104.
Instruction store array 180 includes four separate store
arrays, column 0 store array 200, column 1 store array
201, column 2 store array 202 and column 3 store array
203 as well as multiplexer (MUX) circuit 206.
Multiplexer 206 receives column hit indication control
signals from linear tag array 182 which indicate whether
there was a match to a linear tag value stored in the
linear tag array and provides the instruction which is
stored in one of the columns of the store arrays as
output.

Address tag array 182 includes linear tag arrays 210 -
213 corresponding to columns 0 - 3. Linear tag arrays
210 - 213 are organized with the same set and block
configuration as store arrays 200 - 203. Linear tag
arrays 210 - 213 each include a plurality of linear tag
entries corresponding to the entries of respective store
arrays 200 - 203. Each linear tag array is coupled with
a respective compare circuit 220 - 223 which provides a
respective column hit indication signal (COL HIT0 - COL
HIT3). Accordingly, each column of instruction cache 104
includes a store array, a linear tag array and a compare
circuit. Store arrays 200 - 203, address tag arrays 210
- 213, and compare circuits 220 - 223 all receive the
linear address FETCH PC from address circuit 172.
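
The column organization of FIG. 6 may be illustrated by
the following Python sketch, given as a software analogy
only. It assumes the geometry stated above for
instruction cache 104 (four columns of 256 sixteen byte
entries, hence a 20 bit tag, an 8 bit index and a 4 bit
offset). The names Column, split_fetch_pc and lookup are
hypothetical, and the valid bits are omitted.

```python
OFFSET_BITS = 4   # sixteen byte entries
INDEX_BITS = 8    # 256 entries per column
COLUMNS = 4       # 4-way set associative


class Column:
    """One column: a linear tag array paired with a store array (hypothetical)."""

    def __init__(self):
        self.tags = [None] * (1 << INDEX_BITS)
        self.store = [None] * (1 << INDEX_BITS)


def split_fetch_pc(fetch_pc):
    offset = fetch_pc & ((1 << OFFSET_BITS) - 1)
    index = (fetch_pc >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = fetch_pc >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def lookup(columns, fetch_pc):
    """Return (column hit signals, addressed byte or None)."""
    tag, index, offset = split_fetch_pc(fetch_pc)
    # Each compare circuit checks its own column at the same index.
    hits = [column.tags[index] == tag for column in columns]
    # The multiplexer forwards the entry from the column that reported a hit.
    for column, hit in zip(columns, hits):
        if hit:
            return hits, column.store[index][offset]
    return hits, None


columns = [Column() for _ in range(COLUMNS)]
tag, index, offset = split_fetch_pc(0x12345678)
columns[2].tags[index] = tag
columns[2].store[index] = bytes(range(16))

hit_signals, byte_value = lookup(columns, 0x12345678)
miss_signals, miss_value = lookup(columns, 0x22345678)
```

Here hit_signals plays the role of COL HIT0 - COL HIT3;
at most one of the four signals is expected to be
asserted for any given address.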

Referring to FIG. 7, a description of data cache 150
with reference to the present invention is presented.
Data cache 150 is a linearly addressed, 8 Kbyte 4-way
set associative cache. Each set of data cache 150
includes 128 entries; each entry includes a sixteen byte
block of information. (It is noted that if a 16 Kbyte
4-way set associative cache were employed as data cache
150, then each set of data cache 150 would include 256
entries.) Data cache 150 includes data cache controller
300, data store array 312 and a linear data tag array
310. Data cache controller 300 provides control signals
to orchestrate the various operations of data cache 150.
Data cache controller 300 receives control signals
(CONTROL) from load store section 134 as well as from
IAD bus 102; data cache controller 300 provides control
signals to cache array 304. Data store array 312 stores
data in blocks and provides the blocks of data when a
particular block is addressed. Data store array 312 is
also coupled with IAD bus 102; in addition to the
control signals from cache controller 300, data store
array 312 receives address signals and data signals from
load store section 134.

Data cache 150 is organized into two arrays, data store
array 312 and linear data tag circuit 310. Data cache
array 312 receives and provides two data signals (DATA
A, DATA B) to load/store functional unit 134. Linear
data tag array 310 is addressed by two linear addresses
(ADDR A, ADDR B) which are provided by load/store
functional unit 134; the two linear addresses are also
provided to data store array 312. Accordingly, data
cache 150 is a dual ported memory array, both ports
being coupled with load/store functional unit 134 to
allow two data values to be written or read
simultaneously. Data store array 312 also receives
control signals from linear tag array 310.

Referring to FIG. 8, the middle order bits of each
linear address 319 provide a cache block index (INDEX)
which is used to address a block within each column of
the linear tag arrays and retrieve an entry from each
store array. The upper order bits of each linear address
are compared to the linear data tags of each column of
linear tag array 310, and thus select one of the columns
which are accessed by the cache block index. The lowest
order bits of each linear address provide an offset
(OFF) into the retrieved entry to access the actual byte
addressed by the linear address.

Data cache entry 320 of data cache 150 includes linear
address tag entry 324 and data entry 322. Data entry 322
includes a sixteen byte (DBYTE0 - DBYTE15) block of
data. Data linear address tag entry 324 includes a data
linear tag value (DTAG), linear tag valid bit (TV), a
data valid bit (DV), and valid physical translation bit
(P). The data linear tag value, which corresponds to the
upper 21 bits of the linear address, indicates the
linear block frame address of a block which is stored in
the corresponding store array entry. The linear tag
valid bit indicates whether or not the linear tag is
valid. The data valid bit indicates whether or not a
corresponding entry in store array is valid. The valid
physical translation bit indicates whether or not an
entry provides a successful physical tag hit.

Referring to FIG. 9, linear data tag array 310 and data
store array 312 of linearly addressable data cache 150
are shown. Data cache 150 is arranged in four 2-Kbyte
columns, namely, column 0, column 1, column 2, and
column 3. The arrangement of linear data tag array 310
and data store array 312 is similar to that of linear
instruction tag array 182 and instruction store array
180. However, linear data tag array 310 simultaneously
receives two linear addresses (ADDR A, ADDR B) and data
store array 312 simultaneously receives and provides two
data signals (DATA A, DATA B), i.e., data cache 150
functions as a dual ported data cache.

Data store array 312 includes four separate data store
arrays, column 0 store array 350, column 1 store array
351, column 2 store array 352, and column 3 store array
353 as well as multiplexer (MUX) circuit 360.
Multiplexer 360 receives control signals from linear
data tag array 310 which indicate whether there is a
match to a linear tag value stored in a respective
linear tag array. Multiplexer 360 receives and provides
the data to store arrays 350 - 353; multiplexer 360 also
receives and provides the data to IAD bus 102 as well as
load/store functional unit 134.

Linear tag array circuit 310 includes linear tag arrays
370 - 373 corresponding to columns 0 - 3. Each linear
tag array is coupled with a corresponding compare
circuit 374 - 377. Accordingly, each column of data
cache 150 includes a store array, a linear tag array and
a compare circuit. Store arrays 350 - 353, address tag
arrays 370 - 373, and compare circuits 374 - 377 all
receive the linear addresses, ADDR A, ADDR B from
load/store functional unit 134.

Referring to FIG. 10, physical tag circuit 162 includes
instruction physical tag array portion 390 and data
physical tag array portion 392. Instruction physical tag
array portion 390 includes a plurality of instruction
physical tag arrays 400, 401, 402, 403 and a plurality
of instruction compare circuits 404, 405, 406, 407. Data
physical tag array portion includes a plurality of data
physical tag arrays 408, 409, 410, 411 and a plurality
of corresponding data compare circuits 412, 413, 414,
415. Instruction physical tag arrays 400 - 403
correspond to columns 0 - 3 of instruction cache 104.
Data physical tag arrays 408 - 411 correspond to columns
0 - 3 of data cache 150.

Instruction physical tag arrays 400 - 403 receive the
least significant bits of the physical address that is
provided by bus interface unit 45 and provide a
respective physical tag to compare circuits 404 - 407,
which also receive the most significant bits of the same
physical address. Compare circuits 404 - 407 provide
respective instruction column hit indication signals
(IHIT C0 - IHIT C3) to instruction store array 180.
These instruction column hit indication signals are
provided to the HIT COL inputs of multiplexer 206 (see
FIG. 6) to control which column store array provides an
output instruction.

Data physical tag arrays 408 - 411 receive the least
significant bits of the physical address that is
provided by bus interface unit 45 and provide a
respective data physical tag to compare circuits 412 -
415, which also receive the most significant bits of the
same physical address. Compare circuits 412 - 415
provide respective data column hit indication signals
(DHIT C0 - DHIT C3) to data store array 312. These data
column hit indication signals are provided to the HIT
COL A inputs of multiplexer 360 (see FIG. 9) to control
which column store array provides an output instruction.

By providing physical tag arrays which are accessed
separately from the store arrays, cache 150 is more
efficient as it is not necessary to access store arrays
350 - 353, and thus to provide the power required to
access these arrays, to access the physical tags during
bus watching operations. Moreover, further efficiencies
are achieved since the linear to physical address
translation path is not part of this speed path. This is
so because only linear tags are accessed directly from
the microprocessor core.

Referring to FIG.'s 5, 8, 10 and 12, physical tag arrays
400 - 403, 408 - 411 are organized with the same set and
block relationship as their corresponding linear tag
arrays. In other words, instruction physical tag arrays
400, 401, 402, 403 each include a plurality of
instruction physical tag entries corresponding to the
entries of instruction linear tag arrays 210, 211, 212,
213 of instruction cache 104 and data physical tag
arrays 408, 409, 410, 411 include a plurality of data
physical tag entries corresponding to the entries of
linear data tag arrays 370, 371, 372, 373 of data cache
150. Accordingly, each instruction physical tag entry
416 is conceptually included as part of instruction
entry 188 and each data physical tag entry 417 is
conceptually included as part of data entry 320.

As seen in FIG. 12, each physical tag entry 416, 417
includes a physical tag value (PTAG), a physical tag
valid bit (PV), and a shared bit (S). Additionally, each
data physical tag entry 417 also includes a modified bit
(M), a cache disable bit (CD) and a write through bit
(WT). The physical tag value indicates the physical
address after translation from the linear address of the
physical address 418 which corresponds to the
information which is stored in the corresponding entry
of the corresponding store array. The physical tag valid
bit indicates whether or not the corresponding entry of
the corresponding store array contains valid
information. The shared bit indicates whether another
cache elsewhere in a computer system of which processor
100 is a part has the same data. The modified bit
indicates whether the data stored in the store array has
been modified (i.e., written to) and therefore is not
consistent with the corresponding data stored externally
of the cache. The cache disable bit indicates whether
this particular entry is cache disabled, i.e., cannot be
stored in its respective cache. The write through bit
indicates that when the entry is written to the cache,
it should also be written to the entry's corresponding
external memory location.

Referring to FIG. 11, memory management unit 164
includes TLB array 420 as well as TLB compare circuit
422. TLB array 420 is organized as a 4 way set
associative cache. Each set includes 32 entries to
provide a total of 128 TLB entries. Memory management
unit 164 functions as a translation circuit for
translating between linear addresses and physical
addresses.

Referring to FIG. 12, each TLB entry 430 of TLB 
164 includes a linear tag (LTAG) value and a physical 
tag (PTAG) value. The linear tag value corresponds to 
the most significant bits of a linear address 320 and 
physical tag value corresponds to the most significant 
bits of a physical address 418 that corresponds to lin- 
ear address 320. By concatenating the physical tag 
value with the lower order bits of the linear address 
320 which corresponds to the linear tag entry, the 
physical address is advantageously obtained without 
using two levels of page tables. 
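
The concatenation described above may be illustrated by
the following Python sketch. It is a minimal model under
assumptions that are not stated in this description: a
4 Kbyte page (12 low order bits carried over from the
linear address, as is conventional for X86 paging), a
row selected by the low five bits of the linear page
number, and the full linear page number stored as LTAG.
The function names are hypothetical.

```python
PAGE_BITS = 12   # assumed 4 Kbyte page; not stated in the description
TLB_WAYS = 4     # 4 way set associative
TLB_ROWS = 32    # 32 entries per set, 128 entries in total


def make_tlb():
    return [[None] * TLB_ROWS for _ in range(TLB_WAYS)]


def tlb_fill(tlb, way, linear, physical):
    """Record a translation as an (LTAG, PTAG) pair in the chosen way."""
    page = linear >> PAGE_BITS
    tlb[way][page % TLB_ROWS] = (page, physical >> PAGE_BITS)


def translate(tlb, linear):
    """Return the physical address, or None on a TLB miss."""
    page = linear >> PAGE_BITS
    row = page % TLB_ROWS
    for way in tlb:
        entry = way[row]
        if entry is not None and entry[0] == page:
            ptag = entry[1]
            # Concatenate PTAG with the low order bits of the linear address.
            return (ptag << PAGE_BITS) | (linear & ((1 << PAGE_BITS) - 1))
    return None


tlb = make_tlb()
tlb_fill(tlb, way=1, linear=0x00403000, physical=0x0009A000)

hit = translate(tlb, 0x00403ABC)    # 0x0009AABC
miss = translate(tlb, 0x00404ABC)   # None: no entry for this page
```

On a miss such as the second lookup, a full page table
walk would be needed; that path is outside the scope of
this sketch.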

FIG. 13 shows a detailed block diagram of replacement
cache 60. Replacement cache 60 includes a replacement
mechanism 800, for example an LRU replacement mechanism.
Replacement mechanism 800 implements a particular
replacement strategy for replacement cache 60. For
example, the replacement mechanism selected for
replacement cache may be an LRU, LFU or a random
replacement algorithm. When a replacement cache entry is
removed from the replacement cache by the replacement
algorithm associated therewith, that entry is written
back to main memory if that entry was modified.
Otherwise the entry is discarded.
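
The write back rule stated above may be illustrated with
an LRU policy by the following Python sketch. It models
a single row of the replacement cache across its columns
and is a sketch only: the class ReplacementSet, its
methods, and the use of the tag as the main memory key
are hypothetical simplifications.

```python
from collections import OrderedDict


class ReplacementSet:
    """One row of the replacement cache with an LRU policy (hypothetical)."""

    def __init__(self, ways=4):
        self.ways = ways
        self.entries = OrderedDict()  # tag -> (data, modified), oldest first

    def touch(self, tag):
        # A hit makes the entry the most recently used one.
        self.entries.move_to_end(tag)

    def insert(self, tag, data, modified, main_memory):
        if tag in self.entries:
            self.entries.pop(tag)
        elif len(self.entries) >= self.ways:
            old_tag, (old_data, old_modified) = self.entries.popitem(last=False)
            if old_modified:
                main_memory[old_tag] = old_data  # written back to main memory
            # an unmodified entry is simply discarded
        self.entries[tag] = (data, modified)


main_memory = {}
row = ReplacementSet(ways=2)
row.insert(1, "a", True, main_memory)
row.insert(2, "b", False, main_memory)
row.insert(3, "c", False, main_memory)  # evicts tag 1 (modified): written back
row.insert(4, "d", False, main_memory)  # evicts tag 2 (clean): discarded
print(main_memory)        # {1: 'a'}
print(list(row.entries))  # [3, 4]
```

An LFU or random policy would change only the choice of
the entry removed; the test of the modified status on
removal would remain the same.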

Replacement mechanism 800 is coupled to a replacement
cache controller 805 which receives access requests at
an input thereof. Cache controller 805 is coupled to a
128 bit read/write bus 810. Read/write bus 810 is
sufficiently wide to accommodate a 16 byte cache block
or entry. Read/write bus 810 is coupled via an IAD
interface (latch/driver) 815 to internal address data
(IAD) bus 102 as shown to enable address and data
information to be written to and retrieved from
replacement cache 60.

As seen in FIG. 13, replacement cache 60 includes
replacement store array 65 and replacement tag array 70.
Replacement cache 60 is a unified cache and thus store
array 65 stores both instructions and data. Replacement
cache 60 is arranged in four 8 Kbyte columns, namely
column 0, column 1, column 2, and column 3,
corresponding to the four sets of this four way set
associative cache. Replacement store array 65 includes
four separate store arrays, namely column 0 store array
820, column 1 store array 821, column 2 store array 822
and column 3 store array 823. Each of store arrays 820,
821, 822 and 823 stores 8 Kbytes in this particular 32
Kbyte replacement cache implementation. Each store array
stores up to 512 entries, namely 512 16 byte entries or
blocks in this case. Each of the store arrays stores a
different one of the four sets of this four way set
associative cache.

Replacement store array 65 further includes read/write
(R/W) interfaces 830, 831, 832 and 833 respectively
coupled to column 0 store array 820, column 1 store
array 821, column 2 store array 822 and column 3 store
array 823 as shown. R/W interfaces 830-833 receive
column hit information control signals from replacement
tag array 70 which indicate whether there was a match to
a tag stored in replacement tag array 70 and further
provide the instruction or data which is stored in one
of the columns of the store array as output. More
specifically, the column hit information control signals
HIT COL 0, HIT COL 1, HIT COL 2 and HIT COL 3 are
provided to R/W interfaces 830, 831, 832 and 833,
respectively (connection not shown). In one embodiment
of the invention, R/W interfaces 830-833 are implemented
as a multiplexer in a manner similar to that shown in
FIG. 6 with respect to multiplexer 206.

Replacement tag array 70 includes four separate tag
arrays, namely column 0 tag array 840, column 1 tag
array 841, column 2 tag array 842 and column 3 tag array
843. Each tag array is capable of storing 512 16 byte
tags, namely one tag per each entry of the store array
for that column. The separate tag arrays 840-843 are all
coupled to read/write bus 810 to receive the index
portion (bits 12:4) of the blocks provided by R/W bus
810. The outputs of tag arrays 840-843 are coupled to
respective comparators 850-853 as shown. Each comparator
is provided with the tag value associated with the
pending read or write operation. Comparators 850-853
perform compare operations which indicate when hits
occur in arrays 840-843, respectively.

Referring now to FIG. 14A, the middle order bits (12:4) of
physical address 860 provide a cache block index (INDEX)
which is used to address a block within each column of
replacement tag arrays 840-843 and retrieve an entry from
each store array. The upper order bits (31:13), namely the
TAG VALUE, of each physical address 860 are compared to the
tags of each column of replacement tag array 70, and thus
select one of the columns which is accessed by the cache
block index (INDEX). The lowest order bits (3:0) provide an
offset (OFFSET) into the retrieved entry to access the
actual byte addressed by physical address 860.
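
The address split described for FIG. 14A can be sketched in a few lines of Python. This is only an illustrative model, assuming a 32-bit physical address; the function name `split_physical_address` and the mask constants are hypothetical and do not appear in the specification.

```python
# Illustrative sketch only: field widths follow the text (TAG VALUE = bits
# 31:13, INDEX = bits 12:4, OFFSET = bits 3:0). Names are hypothetical.

OFFSET_BITS = 4    # bits 3:0 select one of 16 bytes in the entry
INDEX_BITS = 9     # bits 12:4 select one of 512 blocks per column
TAG_BITS = 19      # bits 31:13 are compared against the stored tags


def split_physical_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, index, offset) for a 32-bit physical address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


tag, index, offset = split_physical_address(0x12345678)
```

With nine index bits each column holds 512 entries, which agrees with the 512 tags per tag array stated above.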

FIG. 14B is a representation of the value stored in each
replacement cache entry or block 865. Replacement cache
entry 865 includes address tag entry 870 and data entry
875. Data entry 875 includes a sixteen byte (RBYTE0,
RBYTE1, ... RBYTE15) block of data. The value stored in
each block also includes a physical tag valid bit (V), a
shared bit (S) and a modified bit (M).

By way of summary, the operation of the memory cache
architecture is now discussed in the situation where a miss
in the replacement cache occurs. For a replacement cache
miss to occur, there must first be a miss in first level
cache 30. In this example, it will be assumed that the
pending operation is an instruction read. However, this
discussion applies in general to data reads and writes as
well.

When an instruction read is pending, the first level
instruction cache linear tags are first accessed. It is
assumed that there is no match between the requested
address and the tags stored in the first level linear tag
array. In this event, the linear address is translated to a
physical address by the translation lookaside buffer array
420. The resultant physical tag is then checked for
aliasing. If a tag match is still not found, then a first
level cache miss has occurred. In the event of a first
level cache miss, the replacement cache physical tags are
accessed and checked for a tag match. If no match is found,
then a replacement cache miss has occurred.

When a replacement cache miss occurs, an external memory
access to main memory 50 is conducted to obtain the
addressed information. The entry in main memory 50 which
contains the requested information is transmitted to first
level cache 30 and is stored therein according to the 4-way
set associative addressing scheme of that cache. The
particular entry to be replaced in first level cache 30 is
then determined according to the replacement algorithm
employed for cache 30. In this example, it is assumed that
a random replacement algorithm is employed for first level
cache 30. A random counter (not shown) picks from 0:3 and
in this particular instance picks 2. Thus entry number 2 is
expelled from first level cache 30 and is driven to
replacement cache 60. Entry number 2 includes physical
address information, a valid bit (V) and 16 bytes of store
data. It is noted that if we were dealing with data, as
opposed to the instruction of the present example, entry
number 2 would also include the state of the shared bits
(S) and modified bits (M). In the case of instructions
however, the shared bit (S) and the modified bit (M) are
always set to exclusive since instructions are not
modified.

The replacement cache then takes the physical address of
the entry (discard entry) that was driven from the first
level cache to the replacement cache and accesses its own
4-way set associative cache. In this particular example, it
is assumed that the replacement cache is employing a random
replacement algorithm to determine replacement and that the
random number this time is 3. Thus, entry number 3 is
copied to a temporary latch (not shown) prior to being
discarded. The discard entry from first level cache 30 is
then written into entry 3 of replacement cache 60 as per
the set associativity scheme selected for replacement cache
60. In one embodiment of the invention, if the replacement
cache replacement algorithm selects entry 3 of replacement
cache 60 as the entry to be discarded, the discard entry
from the first level cache is written into replacement
cache 60 at the same place that entry 3 used to occupy in
the replacement cache prior to being discarded.

The entry thus written into replacement cache 60 includes a
physical tag, valid bit (V), shared bit (S), modified bit
(M) and 16 bytes of data. The status of the entry from
replacement cache 60 that was copied into the temporary
latch is then inspected. If the status of the entry in the
temporary latch is shared, exclusive or invalid it is
discarded. If the status of the entry in the temporary
latch is modified, then it is written back to main memory
50.
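
The double-miss sequence just summarized can be modelled roughly as follows. This is a simplified sketch, not the disclosed circuit: each cache set is represented as a Python list of four ways, main memory as a dictionary keyed by block address, and the victim ways are passed in rather than produced by a random counter. The names `Entry` and `handle_double_miss` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical model of one cache entry (tag, status bits, 16 bytes)."""
    block_addr: int = 0
    data: bytes = bytes(16)
    valid: bool = False
    shared: bool = False
    modified: bool = False


def handle_double_miss(l1_set, rc_set, fetched, l1_way, rc_way, main_memory):
    """Model a miss in both the first level and the replacement cache.

    l1_set / rc_set: lists of four Entry objects (one set of each cache).
    fetched: the Entry retrieved from main memory.
    Returns True if the replacement cache victim was written back.
    """
    discard = l1_set[l1_way]          # entry expelled from the first level cache
    l1_set[l1_way] = fetched          # requested entry is stored in its place
    latch = rc_set[rc_way]            # victim copied to the temporary latch
    rc_set[rc_way] = discard          # discard entry takes the victim's place
    if latch.valid and latch.modified:
        main_memory[latch.block_addr] = latch.data   # write back modified data
        return True
    return False                      # shared, exclusive or invalid: dropped


# Mirrors the example in the text: way 2 of the first level set and
# way 3 of the replacement set are chosen for replacement.
l1 = [Entry() for _ in range(4)]
rc = [Entry() for _ in range(4)]
l1[2] = Entry(0x100, b"A" * 16, valid=True)
rc[3] = Entry(0x200, b"B" * 16, valid=True, modified=True)
memory = {}
wrote_back = handle_double_miss(l1, rc, Entry(0x300, b"C" * 16, valid=True),
                                l1_way=2, rc_way=3, main_memory=memory)
```

In this run the modified victim at block address 0x200 is the only entry that reaches main memory; the first level discard entry is retained in the replacement cache.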

Further, by way of summary, the operation of the memory
cache architecture is now discussed in the situation
wherein a miss occurs in the first level cache and a hit
occurs in the replacement cache. In this example, it is
assumed that the pending operation is a data read (or load)
operation. A data write operation is similar to the
operation described. In this instance, first level cache 30
is accessed. The linear tags in the first level cache are
checked and a miss is observed. The linear address
associated with the read operation is then translated to
the corresponding physical address via the translation
lookaside buffer. Next, the physical instruction/data tags
are checked for an alias. A physical tag miss is then
observed in the first level cache.

In the event of a first level cache miss, the replacement
physical tags of the replacement cache are scanned for a
match with the address information of the pending read
operation. In this example it is assumed that there is a
hit on column 1 of replacement cache 60, i.e. a hit has
occurred somewhere in set 1 which is stored in replacement
cache 60. When the replacement cache hit occurs, hit
information signifying that such a hit has occurred is
communicated from replacement cache 60 to first level cache
30. The first level cache then allocates an entry for later
storage of the replacement cache hit entry according to the
4-way set associative addressing scheme used for first
level cache 30. A random replacement algorithm is used in
this example to determine the particular entry of data
store array 312 that will be discarded. For purposes of
this example, it is assumed that an entry in set 0 will be
discarded. An entry in set 0 of data store array 312 is
then written into a holding register (not shown) as the
discard entry. The discard entry includes the set 0
physical address, data, shared, modified and valid bits.
The discard entry from first level cache 30 is transmitted
to replacement cache 60 and the requested entry
(replacement cache hit entry) from replacement cache 60 is
transmitted to first level cache 30. Thus, a rotation
occurs between replacement cache 60 and first level cache
30.

Replacement cache 60 writes the discard entry in the
holding register into replacement store array 65 in the
column that hit (column 1 in this example). This write into
the replacement store array includes tag, data, shared and
modified bits. First level cache 30 then takes the physical
address of the entry rotated from replacement cache 60 to
cache 30 and writes the physical tag of such rotated entry
to the corresponding physical tag array 392 in set 0
thereof. The linear tag associated with such rotated entry
is written to linear data tag array 310 in set 0 thereof.
The shared and modified bits of the rotated entry from
replacement cache 60 are written into the physical tag
array 392, also in set 0 thereof. The valid information and
actual data of the rotated entry from the replacement cache
are stored in data store array 312 as per the set
associative convention. Microprocessor 10 then continues
execution.
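
The rotation on a replacement cache hit amounts to a swap through a holding register. The sketch below is a simplified illustration under assumed data structures (lists of four ways, a hypothetical `Line` record keyed by block address); it does not model the separate linear and physical tag arrays described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    """Hypothetical cache line record used only for this illustration."""
    block_addr: int
    data: bytes = bytes(16)
    valid: bool = True
    shared: bool = False
    modified: bool = False


def find_hit_column(rc_set, block_addr: int) -> Optional[int]:
    """Compare the requested address with each column; return the hit column."""
    for column, line in enumerate(rc_set):
        if line.valid and line.block_addr == block_addr:
            return column
    return None


def rotate_on_hit(l1_set, rc_set, l1_way: int, hit_column: int) -> Line:
    """Swap the first level discard entry with the replacement cache hit entry."""
    holding_register = l1_set[l1_way]        # discard entry leaves the first level
    l1_set[l1_way] = rc_set[hit_column]      # hit entry moves to the first level
    rc_set[hit_column] = holding_register    # discard entry fills the hit column
    return l1_set[l1_way]


# Mirrors the example in the text: hit on column 1, first level way 0 discarded.
l1 = [Line(0x10), Line(0x20), Line(0x30), Line(0x40)]
rc = [Line(0x50), Line(0x60, modified=True), Line(0x70), Line(0x80)]
column = find_hit_column(rc, 0x60)
rotated = rotate_on_hit(l1, rc, l1_way=0, hit_column=column)
```

Because the hit entry and the discard entry simply trade places, no entry leaves the chip in this case and the modified bit travels with the rotated entry.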

In the preferred embodiment of the invention, the size of
the replacement cache store array is approximately the same
size as the sum of the sizes of the first level cache
instruction store array and data store array or larger. For
example, when a split instruction/data store array is
employed as first level cache 30, a 32 Kbyte replacement
cache store array produces acceptable results when the
first level instruction store array is 16 Kbytes and the
first level data store array is 8 Kbytes. In those cases
where such split instruction/data store arrays are used as
first level cache 30, as opposed to a unified cache being
used for cache 30, the size of the first level cache is
considered to be the sum of the sizes of the first level
instruction and data store arrays. While generally the size
of the replacement cache is approximately the same size as
the size of the first level cache or larger, performance
improvements are also achieved when the size of the
replacement cache is within the range of approximately 1/2
the size of the first level cache to approximately 8 times
the size of the first level cache or larger. For example,
in the case where the size of the first level cache is 16
Kbytes, the size of the replacement cache ranges from
approximately 8 Kbytes to approximately 256 Kbytes or
larger. A limiting factor on the upper end is the
availability of space on the microprocessor die.

While a microprocessor apparatus and memory architecture
therefor have been described above, it is clear that a
method for operating such apparatus has also been
disclosed. Briefly, the method involves a microprocessor
including a microprocessor core and a first level cache
coupled together and situated on a common semiconductor
die. A main memory is situated external to the
microprocessor and is coupled to the microprocessor. The
method of accessing memory includes the steps of providing
a replacement cache situated on the semiconductor die and
coupled to the first level cache, the replacement cache
being at least as large as approximately one half the size
of the first level cache. The method also includes the step
of discarding an entry from the first level cache when a
cache miss occurs in the first level cache, the entry thus
discarded being designated the discard entry. The method
further includes the step of storing the discard entry in
the replacement cache for later use by the microprocessor
core. The method also includes a step wherein the
replacement cache supplies a hit entry to the first level
cache when a cache hit occurs in the replacement cache. The
method still further includes the step of storing the hit
entry in the first level cache for later use by the
microprocessor core.

The foregoing has described a microprocessor with an
advanced cache memory system. The cache memory system of
the present invention provides increased cache performance
while avoiding undue increases in the amount of chip area
consumed by the cache. The cache memory system of the
present invention also desirably reduces the number of
accesses to external memory. Moreover, power is
advantageously conserved by this advanced cache memory
system.

While only certain preferred features of the invention have
been shown by way of illustration, many modifications and
changes will occur. It is, therefore, to be understood that
the present claims are intended to cover all such
modifications and changes which fall within the true spirit
of the invention.



Claims 

1) A microprocessor comprising: 
a semiconductor die; 

a microprocessor core situated on said semi- 
conductor die; 

a first level set associative cache situated on 
said semiconductor die and coupled to said micropro- 
cessor core, said first level cache exhibiting a prede- 
termined byte size sufficiently large to store a prede- 
termined number of information entries; and 

a replacement cache situated on said semi- 
conductor die and coupled to said first level cache, 
said replacement cache storing information entries 
which are discarded from said first level cache as a 
result of first level cache misses, said replacement 
cache being at least as large as approximately one 
half the size of said first level cache. 

2) The microprocessor of claim 1 further compris- 
ing a replacement cache hit detection circuit, coupled 
to said replacement cache, for detecting when a hit 
occurs in said replacement cache and for supplying a 
hit entry from said replacement cache to said first lev- 
el cache when a hit occurs in said replacement cache. 

3) The microprocessor of claim 1 further comprising memory
accessing means for accessing an external memory to
retrieve a desired information entry when both a first
level cache miss and a replacement cache miss occur.

4) The microprocessor of claim 1 wherein said replacement
cache comprises a set associative replacement cache.

5) The microprocessor of claim 1 wherein said replacement
cache comprises a 4 way set associative replacement cache.

6) The microprocessor of claim 1 wherein said first level
cache comprises a 4 way set associative cache.



7) A microprocessor comprising:
a semiconductor die;

a microprocessor core situated on said semiconductor die;

a first level set associative cache situated on said
semiconductor die and coupled to said microprocessor core,
said first level cache exhibiting a predetermined byte size
sufficiently large to store a predetermined number of
information entries;

a first level cache hit detector, coupled to said first
level cache, for detecting when a hit occurs in said first
level cache and for discarding an entry from said first
level cache as a discard information entry when a cache
miss occurs in said first level cache;

a replacement cache situated on said semiconductor die and
coupled to said first level cache, said replacement cache
storing discard information entries which are discarded
from said first level cache as a result of first level
cache misses, said replacement cache being at least as
large as approximately one half the size of said first
level cache; and

a replacement cache hit detector, coupled to said
replacement cache, for detecting when a hit occurs in said
replacement cache after a first level cache miss and for
supplying a hit entry from said replacement cache to said
first level cache when a hit occurs in said replacement
cache.

8) The microprocessor of claim 7 wherein said first level
cache includes a first level instruction cache and a first
level data cache.

9) The microprocessor of claim 7 further comprising a clock
controller for providing a clocking signal to said
replacement cache when said replacement cache is being
accessed and for not providing said clocking signal to said
replacement cache when said replacement cache is not being
accessed, such that power is conserved.

10) The microprocessor of claim 7 further comprising a
first level cache hit detection circuit, situated in said
first level cache, for detecting when a hit occurs in said
first level cache and for discarding an entry from said
first level cache as a discard information entry when a
cache miss occurs in said first level cache.

11) The microprocessor of claim 7 combined with an external
memory for providing instructions and data to said
microprocessor.

12) The microprocessor of claim 11 further comprising
memory accessing means for accessing said external memory
when both a first level cache miss occurs and a replacement
cache miss occurs to retrieve a desired information entry
from said external memory and send said desired information
entry to said first level cache for storage therein.

13) The microprocessor of claim 7 wherein said replacement
cache comprises a set associative replacement cache.

14) The microprocessor of claim 7 wherein said replacement
cache comprises a 4 way set associative replacement cache.

15) The microprocessor of claim 7 wherein said first level
cache comprises a 4 way set associative cache.

16) In a microprocessor including a microprocessor core and
a first level cache coupled together and situated on a
common semiconductor die, a main memory being situated
external to said microprocessor and coupled to said
microprocessor, a method of accessing memory comprising:

providing a replacement cache situated on said
semiconductor die and coupled to said first level cache,
said replacement cache being at least as large as
approximately one half the size of said first level cache;

discarding an entry from said first level cache when a
cache miss occurs in said first level cache, the entry thus
discarded being designated the discard entry;

storing said discard entry in said replacement cache for
later use by said microprocessor core;

said replacement cache supplying a hit entry to said first
level cache when a cache hit occurs in said replacement
cache, and

storing said hit entry in said first level cache for later
use by said microprocessor core.

17) The method of claim 16 further comprising the step of
clocking said replacement cache during those periods of
time when said replacement cache is being accessed.

18) The method of claim 17 further comprising the step of
ceasing to clock said replacement cache during those
periods of time when said replacement cache is not being
accessed.

19) The method of claim 16 wherein said first level cache
is set associative.

20) The method of claim 16 wherein said replacement cache
is set associative.

21) The method of claim 16 wherein said first level cache
is four-way set associative.

22) The method of claim 16 wherein said replacement cache
is four-way set associative.
